
Romanian LPC

Chapter details

Sentence splitter, tokenizer, lemmatizer, POS tagger

The Romanian POS Tagger automatically performs four tasks: sentence splitting, tokenization, POS tagging and lemmatization. It requires a language model for the POS tagger, a language dictionary and a set of rules used in the redactor rules system.
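The four steps can be illustrated with a deliberately simplified sketch. It is not the Romanian POS Tagger itself. The statistical language model and the rules are replaced by a lookup in a tiny hypothetical lexicon (`LEXICON`), and the sentence splitter and tokenizer are plain regular expressions. All names are illustrative.

```python
import re

# Hypothetical toy lexicon: lowercase surface form -> (POS tag, lemma).
# A real tagger uses a full Romanian dictionary plus a trained language model.
LEXICON = {
    "copiii": ("NOUN", "copil"),
    "citesc": ("VERB", "citi"),
    "cărți": ("NOUN", "carte"),
    "ea": ("PRON", "ea"),
    "doarme": ("VERB", "dormi"),
    ".": ("PUNCT", "."),
}


def split_sentences(text):
    """Split after sentence-final punctuation that is followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def tokenize(sentence):
    """Return word tokens (Unicode-aware, so diacritics are kept) and punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", sentence)


def tag_and_lemmatize(tokens):
    """Look each token up in the lexicon; unknown words get 'UNK' and their lowercase form as lemma."""
    result = []
    for tok in tokens:
        pos, lemma = LEXICON.get(tok.lower(), ("UNK", tok.lower()))
        result.append((tok, pos, lemma))
    return result


def process(text):
    """Run the whole toy chain: sentences -> tokens -> (token, POS, lemma) triples."""
    return [tag_and_lemmatize(tokenize(s)) for s in split_sentences(text)]


if __name__ == "__main__":
    for sentence in process("Copiii citesc cărți. Ea doarme."):
        print(sentence)
```

Unknown words fall back to the tag `UNK`; a real tagger would instead rely on its language model to choose a tag.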

NP extractor

The Romanian NP Chunker uses GGS (Graphical Grammar Studio, http://sourceforge.net/projects/ggs/), a visual tool for describing grammars. A Romanian grammar has been developed that allows fully recursive NP chunks.
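GGS grammars are drawn graphically, so the Romanian grammar itself cannot be reproduced here. As a rough illustration of what "fully recursive NP chunks" means, the sketch below is a recursive-descent chunker over already tagged tokens. It uses an invented toy rule, NP → DET? NOUN ADJ* (GEN? NP)?, where `GEN` stands for the Romanian genitival article (al, a, ai, ale). The tag set and the rule are assumptions, not the grammar used by the chunker.

```python
from typing import List, Optional, Tuple

Token = Tuple[str, str]   # (word, POS tag)
Tree = Tuple[str, list]   # ("NP", children); children are Tokens or nested Trees


def parse_np(tagged: List[Token], i: int) -> Tuple[Optional[Tree], int]:
    """Try to parse an NP starting at position i; return (tree or None, next index)."""
    children: list = []
    j = i
    if j < len(tagged) and tagged[j][1] == "DET":
        children.append(tagged[j])
        j += 1
    if j >= len(tagged) or tagged[j][1] != "NOUN":
        return None, i                      # no head noun, so no NP here
    children.append(tagged[j])
    j += 1
    while j < len(tagged) and tagged[j][1] == "ADJ":
        children.append(tagged[j])
        j += 1
    # Recursive step: an optional genitival article followed by an embedded NP.
    k = j
    if k < len(tagged) and tagged[k][1] == "GEN":
        k += 1
    sub, end = parse_np(tagged, k)
    if sub is not None:
        children.extend(tagged[j:k])        # keep the genitival article, if present
        children.append(sub)
        j = end
    return ("NP", children), j


def chunk(tagged: List[Token]) -> list:
    """Greedy left-to-right chunking: maximal NPs become trees, other tokens pass through."""
    out, i = [], 0
    while i < len(tagged):
        tree, nxt = parse_np(tagged, i)
        if tree is None:
            out.append(tagged[i])
            i += 1
        else:
            out.append(tree)
            i = nxt
    return out


if __name__ == "__main__":
    sentence = [("Am", "AUX"), ("văzut", "VERB"), ("casa", "NOUN"),
                ("mare", "ADJ"), ("a", "GEN"), ("vecinului", "NOUN")]
    print(chunk(sentence))
```

Because `parse_np` calls itself for the embedded NP, chains such as "casa fratelui prietenului" nest to any depth, which is the property the recursive grammar provides.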

NE recognizer

The Romanian NE recognizer primarily uses JRC-Names. A secondary NER, based on an ANNIE GATE application, has been customized for Romanian. This GATE application, wrapped as a UIMA primitive engine, additionally provides date, location, money and percentage NEs.
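The two-level design can be sketched as a primary name-list pass followed by a secondary rule-based pass that only fills spans the primary pass left free. The name list below is a hypothetical stand-in for JRC-Names, and the regular expressions are rough assumptions in the spirit of ANNIE's rules, not the actual GATE grammars.

```python
import re
from typing import List, Tuple

Entity = Tuple[int, int, str, str]  # (start offset, end offset, type, matched text)

# Hypothetical stand-in for a JRC-Names-style list of names.
PRIMARY_NAMES = {"Ion Popescu": "PERSON", "Comisia Europeană": "ORGANIZATION"}

# Hypothetical secondary rules for the extra types (dates, locations, money, percentages).
MONTHS = "ianuarie|februarie|martie|aprilie|mai|iunie|iulie|august|septembrie|octombrie|noiembrie|decembrie"
SECONDARY_PATTERNS = [
    ("DATE", re.compile(rf"\b\d{{1,2}} (?:{MONTHS}) \d{{4}}\b")),
    ("MONEY", re.compile(r"\b\d+(?:[.,]\d+)? (?:lei|euro|dolari)\b")),
    ("PERCENT", re.compile(r"\b\d+(?:[.,]\d+)?\s?%")),
    ("LOCATION", re.compile(r"\b(?:București|Cluj-Napoca|Iași)\b")),
]


def recognize(text: str) -> List[Entity]:
    entities: List[Entity] = []
    # Primary pass: whole-word matches against the name list.
    for name, etype in PRIMARY_NAMES.items():
        for m in re.finditer(rf"(?<!\w){re.escape(name)}(?!\w)", text):
            entities.append((m.start(), m.end(), etype, m.group()))
    # Secondary pass: keep a match only if it overlaps no entity found so far.
    for etype, pattern in SECONDARY_PATTERNS:
        for m in pattern.finditer(text):
            if all(m.end() <= s or m.start() >= e for s, e, _, _ in entities):
                entities.append((m.start(), m.end(), etype, m.group()))
    return sorted(entities)  # order by start offset


if __name__ == "__main__":
    text = ("Pe 15 martie 2010, Ion Popescu a vizitat București; "
            "inflația a fost de 5,5%, iar prețul era 100 lei.")
    for entity in recognize(text):
        print(entity)
```

Giving the primary list precedence means a name it finds is never relabeled by a broader pattern; the secondary rules only add entity types the list does not cover.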